Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees

نویسندگان

  • Mariusz Lapczynski
  • Bartlomiej Jefmanski
چکیده

Building predictive models in customer relationship management refers to each stage in the customer’s lifecycle, i.e. the customer acquisition, development and retention. One may notice that the construction of predictive models is more and more frequently accompanied by an attempt to combine analytical tools of the same type or combining various methods. The first approach is named “ensemble methods”, while the second can be referred to as a “hybrid” one. Authors decided to combine k-means algorithm with decision trees and examine whether cluster validity measures influence the performance of a model. During experiments 5 different cluster validity indices and 8 datasets were used. The performance of models was evaluated by using popular measures such as: accuracy, precision, recall, G-mean, F-measure and lift in the first and in the second decile. The results are far from causing enthusiasm, however, they are promising in some fields.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...

متن کامل

An Efficient Predictive Model for Probability of Genetic Diseases Transmission Using a Combined Model

In this article, a new combined approach of a decision tree and clustering is presented to predict the transmission of genetic diseases. In this article, the performance of these algorithms is compared for more accurate prediction of disease transmission under the same condition and based on a series of measures like the positive predictive value, negative predictive value, accuracy, sensitivit...

متن کامل

GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION

Stochastic nature of earthquake has raised a challenge for engineers to choose which record for their analyses. Clustering is offered as a solution for such a data mining problem to automatically distinguish between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek for the best clustering measures. In...

متن کامل

Fuzzy Modeled K-Cluster Quality Mining of Hidden Knowledge for Decision Support

Problem statement: The work presented Fuzzy Modeled K-means Cluster Quality Mining of hidden knowledge for Decision Support. Based on the number of clusters, number of objects in each cluster and its cohesiveness, precision and recall values, the cluster quality metrics is measured. The fuzzy k-means is adapted approach by using heuristic method which iterates the cluster to form an efficient v...

متن کامل

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013